DETAILED ACTION



Claim Objections 

1. Claims 9-17, 29, 37, and 38 are objected to because of the following
informalities:

• the term "a complex speech model" is introduced in claim 9, line 1; thus,
the indefinite article "a" should be removed from the subsequent reference to
"complex speech model" in claim 9, line 20.

• the term "a complex speech model" is introduced in claim 28, line 2; thus,
the indefinite article "a" should be removed from the subsequent reference to
"complex speech model" in claim 29, line 19.

Appropriate correction is required. 



Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that
form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless -

(e) the invention was described in a patent granted on an application for patent by another filed in the
United States before the invention thereof by the applicant for patent, or on an international application
by another who has fulfilled the requirements of paragraphs (1), (2), and (4) of section 371(c) of this
title before the invention thereof by the applicant for patent.

The changes made to 35 U.S.C. 102(e) by the American Inventors Protection Act
of 1999 (AIPA) do not apply to the examination of this application as the application
being examined was not (1) filed on or after November 29, 2000, or (2) voluntarily
published under 35 U.S.C. 122(b). Therefore, this application is examined under 35
U.S.C. 102(e) prior to the amendment by the AIPA (pre-AIPA 35 U.S.C. 102(e)).



2. Claims 1-8, 18-27, 35, and 36 are rejected under 35 U.S.C. 102(e) as being
anticipated by Kuhn et al. (U.S. Patent 6,327,565), hereinafter referred to as Kuhn.

Regarding claims 1, 7 and 35, Kuhn discloses a system for speaker adaptation 
that includes the following: a new speaker input (col. 5, lines 26-39, Fig. 3, 40), which 
corresponds to "an input for receiving an input signal derived from a spoken utterance 
that contains at least one speech element that potentially matches the given speech 
element"; a set of HMMs 44 (one for each sound) (col. 5, lines 23-34), which 
corresponds to "a model group associated to the given speech element, said model 
group comprising a plurality of speech models, each speech model of said plurality of 
speech models being a different representation of the given speech element"; with an 
inherent processing unit to generate an adapted model based on the input and using a
linear combination of coefficients (Fig. 3, 52; col. 5, lines 50-57), which corresponds to "a
processing unit coupled to the input for processing the input signal and the model group 
to generate a hybrid speech model associated to the given speech element, said hybrid 
speech model being a weighted combination of speech models in said plurality of
speech models effected on the basis of the input signal derived from the spoken 
utterance"; and adaptation occurs during recognition with the inherent output of the 
recognition result from the recognizer (col. 2, lines 45-50), which corresponds to "an 
output for releasing a signal indicative of said hybrid speech model associated to the
given speech element in a format suitable for use by a speech recognition device." 

Regarding claim 2, Kuhn teaches everything claimed, as applied above (see 
claim 1); in addition, Kuhn teaches the modeling of speech units (such as a phrase, 
word, subword, phoneme or the like) (col. 3, lines 4-7), which corresponds to "the given 
speech element is an element selected from the group consisting of phones, diphones, 
syllables and words." 

Regarding claim 3, Kuhn teaches everything claimed, as applied above (see 
claim 2); in addition, Kuhn teaches the training of a speaker dependent model (one for 
each sound unit) (col. 5, lines 30-33), which corresponds to "input signal derived from a
spoken utterance is indicative of a speaker specific speech model associated to the at 
least one speech element." 

Regarding claim 4, Kuhn teaches everything claimed, as applied above (see
claim 3); in addition, Kuhn teaches the creation of a new speaker dependent model (col.
5, lines 22-57), which corresponds to "hybrid speech model is weighted toward the
speaker specific speech model." 

Regarding claim 5, Kuhn teaches everything claimed, as applied above (see 
claim 4); in addition, Kuhn teaches that the speaker dependent model serves to 
estimate the linear combination of coefficients that will comprise the adapted model 44 
(col. 5, lines 50-57), which corresponds to "said hybrid speech model is derived by computing
a linear combination of the speech models in said model group."

Regarding claim 6, Kuhn teaches everything claimed, as applied above (see
claim 5). In addition, Kuhn teaches the following: the use of an iterative process where 
multiple speaker dependent inputs can be used during training (col. 5, lines 22-58, in 
particular lines 54-58), which corresponds to "a first input and wherein said input signal 
is a first input signal, said apparatus further comprising: a) a second input for receiving a 
second input signal conveying a data element identifying the given speech element"; 
speaker dependent and speaker independent HMMs (models) for each sound unit (col. 
4, lines 40-46), which corresponds to "b) a database of model groups comprising a 
plurality of model groups, each model group being associated to a respective speech 
element, each model group comprising a set of speech models"; and the construction of 
a new model for a given sound (in the supervised mode) (col. 5, lines 22-58, in
particular lines 33-36), which corresponds to "said processing unit being further 
operative for extracting from said database of model groups a certain model group 
associated to the data element received at said second input identifying the given 
speech element." 

Regarding claim 8, Kuhn discloses an algorithm (Fig. 3) for speaker adaptation 
inherently implemented with a computational unit that includes the following: a new 
speaker input (col. 5, lines 26-39, Fig. 3, 40), which corresponds to "an input for 
receiving an input signal derived from a spoken utterance that contains at least one 
speech element that potentially matches the given speech element"; a set of HMMs 44
(one for each sound) inherently stored in a memory (col. 5, lines 23-34), which 
corresponds to "a memory unit for storing a model group associated to the given speech 
element, said model group comprising a plurality of speech models, each speech model
of said plurality of speech models being a different representation of the given speech 
element"; with an inherent processing unit to generate an adapted model based on the
input and estimating a linear combination of coefficients to comprise the adapted model
(Fig. 3, 52; col. 5, lines 50-57), which corresponds to "a processing unit coupled to the
input for processing the input signal and the model group to generate a hybrid speech 
model associated to the given speech element, said hybrid speech model being a 
weighted combination of speech models in said plurality of speech models effected on 
the basis of the input signal derived from the spoken utterance"; the adaptation of the 
models during recognition (col. 2, lines 45-50) with the inherent release of the 
recognition result, which corresponds to "an output for releasing a signal indicative of 
said hybrid speech model associated to the given speech element in a format suitable 
for use by a speech recognition device." 

Regarding claim 18, Kuhn teaches techniques and algorithms for speaker 
adaptation based on eigenvoices using multiple model groups (Fig. 3, 44 48 52) where 
the model groups contain HMMs that can be used to represent phrases, words, 
subwords, or phonemes (col. 3, lines 4-8) and inherently implemented on a 
computational device, which corresponds to "a first data structure for storing a plurality 
of model groups, each model group being associated to a respective speech element in 
a phonetic alphabet, each model group comprising a plurality of speech models, each 
model group being suitable for use by a processing device"; and an adapted speech 
model derived from the speaker dependent model based on an estimate of the linear 
combination of coefficients (col. 5, lines 25-57) and inherently implemented on a
computational device, which corresponds to "a second data structure for storing a 
hybrid speech model associated to a given speech element, said hybrid speech model 
being a weighted combination of speech models of the first type in said plurality of 
speech models effected on the basis of an input signal derived from a spoken utterance 
that contains at least one speech element that potentially matches the given speech 
element." 

Regarding claim 19, Kuhn teaches everything claimed, as applied above (see 
claim 18); in addition, Kuhn teaches that the speech units can be phrases, words, 
subwords or phonemes or the like (col. 3, lines 4-7), which corresponds to "the given 
speech element is indicative of a data element selected from the set consisting of 
phones, diphones, syllables and words." 

Regarding claim 20, Kuhn teaches everything claimed, as applied above (see 
claim 19); in addition, Kuhn teaches the use of a data structure with multiple model 
groups with the adapted model being the most highly processed (Fig. 3, 44 48 52, col. 
5, lines 23-58), which corresponds to "wherein each model group comprises two sets of 
speech models namely a first set having a plurality of speech models of a first type and 
a second set having a plurality of speech models of a second type, each speech model 
of a first type in said first set being associated to a speech model of the second type in 
the second set, each speech model of the second type being indicative of a speech 
model having a higher complexity than a speech model of the first type to which the 
speech model of the second type is associated."

Regarding claims 21 and 36, Kuhn discloses an adaptable speech recognition
system (col. 2, lines 45-50) with the following features: a speaker input (Fig. 3 40) which 
when in supervised mode knows the content of an input in advance (col. 5, lines 33-36), 
which corresponds to "an input for receiving an input signal indicative of a spoken 
utterance that is indicative of at least one speech element"; procedures for training of a 
speaker dependent recognizer (Fig. 3 42 44, col. 5, lines 39-49), which corresponds to 
"a first processing unit coupled to said input operative for processing the input signal to 
derive from a speech recognition dictionary at least one speech model associated to a 
given speech element that constitutes a potential match to the at least one speech 
element"; procedures to construct an adapted model based on a supervector (Fig. 3 46 
48 38, col. 5, lines 50-57), which corresponds to "a second processing unit coupled to 
said first processing unit for generating, using a predefined weighting constraint, a 
modified version of the at least one speech model on the basis of the input signal"; and
procedures to construct a new set of HMMs based on the supervector (Fig. 3, 50 52), 
which corresponds to "a third processing unit coupled to said second processing unit for 
processing the input signal on the basis of the modified version of the at least one 
speech model to generate a recognition result indicative of whether the modified version 
of the at least one speech model constitutes a match to the input signal"; and since 
Kuhn's system is a recognition system that automatically adapts during recognition (col. 
2, lines 45-50), Kuhn's system inherently releases recognition results, which 
corresponds to "an output for releasing a signal indicative of the recognition result."

Regarding claim 22, Kuhn teaches everything claimed, as applied above (see
claim 21); in addition, Kuhn teaches the processing of the input to produce a speaker 
dependent model of a specific input when in the supervised mode (Fig. 3 40 42 44, col. 
5, lines 39-49), which corresponds to "wherein said first processing unit is operative for
generating a speaker specific speech model derived on the basis of the input signal, the 
speaker specific speech model being indicative of the acoustic characteristics of the
at least one speech element."

Regarding claim 23, Kuhn teaches everything claimed, as applied above (see 
claim 22); in addition, Kuhn teaches that the speaker dependent model 44 serves to
estimate the linear combination of coefficients that will comprise the adapted model (col. 
5, lines 50-55), which corresponds to "said modified version of the at least one speech 
model is indicative of a hybrid speech model associated to the given speech element." 

Regarding claim 24, Kuhn teaches everything claimed, as applied above (see 
claim 23). In addition, Kuhn teaches the following: the transfer of data between the 
procedures (indicated by the arrows in Fig. 3 between elements 42 44 and 46), which 
corresponds to "coupling member for allowing data exchange between the first 
processing unit and the second processing unit, said coupling member being suitable 
for receiving the speaker specific speech model derived from the input signal"; a model 
group containing HMMs with models of speech sounds (Fig. 3, col. 5, lines 26-39), 
which corresponds to "a model group associated to the given speech element, said 
model group comprising a plurality of speech models, each speech model of said 
plurality of speech models being a different representation of the given speech 
element"; a procedure to construct a new set of HMMs based on the supervector and
using a linear combination of coefficients (Fig. 3, 50; col. 5, lines 50-57), which corresponds to
"a functional unit coupled to the coupling member for processing the speaker specific 
speech model and the model group to generate the hybrid speech model associated to 
the given speech element, said hybrid speech model being a weighted combination of 
speech models in said plurality of speech models effected on the basis of the speaker 
specific speech model"; and since Kuhn discloses a speech recognition system (col. 2, 
lines 45-50) it has an inherent means for indicating a particular recognition event on 
output, which corresponds to "an output coupling member for allowing data exchange 
between the second processing unit and the third processing unit, said output coupling 
member being suitable for releasing a signal indicative of the hybrid speech model 
associated to the given speech element." 

Regarding claim 25, Kuhn teaches everything claimed, as applied above (see 
claim 24); in addition, Kuhn teaches that the dependent model 44 serves to estimate the 
linear combination of coefficients that will comprise the adapted model (col. 5, lines 50- 
54), which corresponds to "said hybrid speech model is weighted toward the speaker 
specific speech model." 

Regarding claim 26, Kuhn teaches everything claimed, as applied above (see 
claim 24); in addition, Kuhn teaches that the dependent model 44 serves to estimate the 
linear combination of coefficients that will comprise the adapted model (col. 5, lines 50- 
54), which corresponds to "said hybrid speech model is derived by computing a linear 
combination of the speech models in said group of speech models."

Regarding claim 27, Kuhn teaches everything claimed, as applied above (see
claim 24); in addition, Kuhn teaches the following: transfer of data between the speaker 
dependent building portion of the model and the adapted model (indicated by arrows in 
Fig. 3), which corresponds to "a) a second coupling member for allowing data exchange 
between the first processing unit and the second processing unit, said second coupling 
member being suitable for receiving a data element identifying the given speech 
element"; groups of models (Fig. 3 44 48 52) containing HMMs of speech sounds (col. 
3, lines 3-7, col. 5, lines 50-57), which corresponds to "b) a database of model groups 
comprising a plurality of model groups, each model group being associated to a 
respective speech element, each model group comprising a set of speech models"; a 
procedure 50 for constructing a new set of HMMs based on the supervector 48 that can 
access the models with the adapted model (Fig. 3 50 52), which corresponds to 
"functional unit being further operative for extracting from said database of model 
groups a certain model group associated to the data element received at said second 
coupling member identifying the given speech element." 



Allowable Subject Matter 

Claims 9-17, 37, and 38 are objected to due to minor informalities, but would be
allowable if corrected (see §1). It is noted that the closest prior art of record, Kuhn et al.
(US Patent 6,327,565) does not teach the generation of a complex speech model that is 
a combination of speech models of the second type in said plurality of speech models.

Claims 28-34 are objected to due to minor informalities (see §1) and as being
dependent upon a rejected base claim, but would be allowable if corrected and rewritten 
in independent form including all of the limitations of the base claim and any intervening 
claims. It is noted that the closest prior art of record, Kuhn et al. (US Patent 6,327,565) 
does not teach the generation of a complex speech model that is a combination of 
speech models of the second type in said plurality of speech models. 

Response to Arguments 

3. See Applicant's arguments from page 22 through page 24, in particular the
statement on page 23, line 14, that "linear combinations are NOT necessary [sic]
weighted combinations."

As stated in the previous response (paper 10, page 19), Kuhn asserts (col. 5, lines 50-53)
that the speaker dependent model serves to estimate the linear combination of
coefficients (weighted combination) that will comprise the adapted model (hybrid speech
model). It is well-known in the art that a linear combination is a sum (or difference) of
elements with coefficients where the coefficients are real numbers that scale the
elements (i.e., a weighted combination). This interpretation is supported by the
Applicant's statement in the specification where "[t]he linear combination is
characterized by a set of parameters indicative of weights associated to speech models
..." (page 14, lines 27-30), which the Examiner maintains supports the interpretation that
"a weighted combination" of speech models is equivalent to the notion of "a linear
combination" of speech models as taught by Kuhn.
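
For illustration only, the equivalence discussed above can be sketched numerically.
The sketch below is not taken from Kuhn or from the Applicant's specification: the
function name linear_combination, the two-element "model" vectors, and the
coefficient values are all hypothetical stand-ins for model parameters (for
example, mean vectors). It shows only that a linear combination is computed by
scaling each element by a real coefficient and summing, so the coefficients act
as weights.

```python
def linear_combination(models, coefficients):
    """Return sum_k coefficients[k] * models[k], computed element by element.

    `models` is a list of equal-length parameter vectors (hypothetical
    stand-ins for speech model parameters); `coefficients` holds one real
    number per model. Each coefficient scales its model, i.e. it is a weight.
    """
    if len(models) != len(coefficients):
        raise ValueError("need exactly one coefficient per model")
    dim = len(models[0])
    if any(len(m) != dim for m in models):
        raise ValueError("all models must have the same dimension")
    return [
        sum(w * m[i] for w, m in zip(coefficients, models))
        for i in range(dim)
    ]


# Two toy parameter vectors and two illustrative coefficients (assumed values).
model_a = [1.0, 2.0]
model_b = [3.0, 4.0]
combined = linear_combination([model_a, model_b], [0.25, 0.75])
print(combined)
```

With these assumed values the result lies closer to model_b, the element with the
larger coefficient. A negative coefficient would give a difference rather than a
sum, which is still a linear combination under the definition stated above.
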

Conclusion



THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time 
policy as set forth in 37 CFR 1.136(a). 

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of 
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the mailing date of this final action. 
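
For illustration only, the date arithmetic described in the preceding paragraph can
be sketched as follows. This is a simplified reading, not an official calculator:
the helper names add_months and shortened_period_expiry are hypothetical, the dates
used are made-up examples rather than dates of record, and the sketch ignores
weekends, holidays, and extensions of time under 37 CFR 1.136(a).

```python
import calendar
from datetime import date


def add_months(d, months):
    """Return the date `months` calendar months after `d`, clamping the day
    to the last day of the target month when necessary."""
    index = d.month - 1 + months
    year = d.year + index // 12
    month = index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def shortened_period_expiry(mailing, first_reply=None, advisory_mailed=None):
    """Sketch of the expiry of the shortened statutory period.

    Default: three months from the mailing date. If a first reply was filed
    within two months and the advisory action was mailed after the
    three-month date, the period expires on the advisory mailing date,
    capped at six months from the mailing date.
    """
    three_months = add_months(mailing, 3)
    six_months = add_months(mailing, 6)
    if (
        first_reply is not None
        and advisory_mailed is not None
        and first_reply <= add_months(mailing, 2)
        and advisory_mailed > three_months
    ):
        return min(advisory_mailed, six_months)
    return three_months


# Hypothetical dates, chosen only to exercise each branch.
mailing = date(2004, 3, 22)
print(shortened_period_expiry(mailing))
print(shortened_period_expiry(mailing, date(2004, 5, 10), date(2004, 7, 1)))
print(shortened_period_expiry(mailing, date(2004, 5, 10), date(2004, 10, 15)))
```

The first call takes the default three-month branch, the second moves the expiry
to the assumed advisory mailing date, and the third is capped at six months.
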

Any response to this office action should be mailed to: 

Commissioner of Patents and Trademarks 
Washington, D.C. 20231 

or faxed to: 

(703) 872-9314 

Hand-delivered responses should be brought to: 

Crystal Park II
2121 Crystal Drive
Arlington, VA.
Sixth Floor (Receptionist)

Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Dr. V. Paul Harper whose telephone number is (703) 
305-4197. The examiner can normally be reached on Monday through Friday from 8:00 
a.m. to 4:30 p.m. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Richemond Dorvil, can be reached on (703) 305-9645. The fax phone 
number for the Technology Center 2600 is (703) 872-9314. 

Any inquiry of a general nature or relating to the status of this application or 
proceeding should be directed to the Technology Center 2600 Customer Service office 
whose telephone number is (703) 306-0377. 




VPH/vph 
March 22, 2004 




